A Link-Based Cluster Collection Approach Combined Contagious Cluster With For Categorical Data Clustering

Authors

  • N. Premalatha
  • M. Chinnusamy
Abstract

Data clustering is a challenging task in data mining. Various clustering algorithms have been developed to cluster or categorize data sets, and many of them are used to cluster categorical data. Some algorithms, however, cannot be applied directly to categorical data. Several attempts have been made to solve the problem of clustering categorical data via cluster ensembles, but these techniques generate the final data partition from incomplete information: the ensemble information matrix represents cluster relations with many unknown entries. The link-based ensemble approach has been established with the ability to discover these unknown values and improve the accuracy of the data partition. Besides clustering, a similarity-based ranking approach, HITS link analysis, is also proposed to enhance the categorical results. This enhanced link-based clustering and ranking method nearly always outperforms both conventional clustering algorithms for categorical data and contagious cluster ensemble techniques in quality.

Keywords: Uncertainty; index; range aggregate

N. Premalatha et al., International Journal of Computer Science and Mobile Computing, Vol. 2, Issue 9, September 2013, pg. 220-226. © 2013, IJCSMC. All Rights Reserved.

I. INTRODUCTION

Data clustering is a challenging task in many applications and one of the fundamental tools for understanding the structure of a data set. Clustering is a data mining technique that places similar data elements into related groups: it aims to categorize data into clusters such that the data in the same cluster are more similar to each other than to those in different clusters. A cluster is thus a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters. The notion of a cluster varies between algorithms, and the clusters found by different clustering algorithms vary in their properties and structure.
Clustering is used in many areas, such as statistical data analysis, machine learning, data mining, pattern recognition, image analysis, and bioinformatics. Various clustering algorithms, including distance-based, hierarchical, partitioning, and probabilistic methods, have been proposed to cluster data sets.

Cluster ensembles provide a solution to challenges inherent to clustering. They can find robust and stable solutions by leveraging the consensus across multiple clustering results: a cluster ensemble combines various clustering outputs into a single consolidated clustering, and the differing outputs are obtained by applying various clustering algorithms. The main goal of ensembles has been to improve the accuracy and robustness of a given classification or regression task, and spectacular improvements have been obtained for a wide variety of data sets. Cluster ensemble methods are presented under three categories: probabilistic approaches, approaches based on co-association, and direct and other heuristic methods.

Categorical variables represent types of data that may be divided into groups. Examples of categorical variables are race, sex, age group, and educational level. Categorical data is a statistical data type consisting of categorical values, used for observed data whose value is one of a fixed number of nominal categories, or for data that has been converted into that form. Categorical data are always nominal, whereas nominal data need not be categorical. Clustering categorical data remains a challenging task for many techniques. A critical problem in cluster ensemble research is how to combine multiple clusterings to yield a superior final clustering result. Different techniques are used to overcome these problems, and link-based similarity is used to improve the clustering result.
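As a rough illustration of the co-association category of ensemble methods mentioned above, the following minimal sketch combines several base clusterings into one consensus partition. It is not the link-based method of the paper: it only shows plain co-association counting followed by a simple threshold-based grouping. The function names, the toy partitions, and the 0.5 threshold are all assumptions made for this example.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base clusterings that put objects i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group objects into connected components of the graph that links
    i and j whenever their co-association exceeds the threshold
    (an assumed, deliberately simple consensus rule)."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical base clusterings of six objects. Cluster ids are arbitrary
# per clustering, so the third one uses swapped labels on purpose.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]

matrix = co_association(partitions)
labels = consensus_clusters(matrix)
print(labels)
```

Because the matrix counts only whether two objects share a cluster, it does not depend on how each base clustering names its clusters. Pairs that never share a cluster get an entry of zero, which is the kind of missing relation that link-based approaches try to estimate rather than leave unknown.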


Similar resources

Application of Combined Local Object Based Features and Cluster Fusion for the Behaviors Recognition and Detection of Abnormal Behaviors

In this paper, we propose a novel framework for behaviors recognition and detection of certain types of abnormal behaviors, capable of achieving high detection rates on a variety of real-life scenes. The new proposed approach here is a combination of the location based methods and the object based ones. First, a novel approach is formulated to use optical flow and binary motion video as the loc...


Presenting a Clustering Algorithm for Categorical Data by Combining Measures

Clustering is one of the main techniques in data mining. Clustering is a process that classifies data set into groups. In clustering, the data in a cluster are the closest to each other and the data in two different clusters have the most difference. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...


Automatic Clustering of Mixed Data Using a Genetic Algorithm

In the real world clustering problems, it is often encountered to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. In addition, traditional methods, for example, the K-means algorithm, usually ask the user to provide the number of clusters. In this...


A Thorough Investigation of Link-Based Cluster Ensemble Approach for Data Clustering

Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions. The problem of clustering becomes more challenging when the data is ca...


Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...



Publication date: 2013